RECOVERING A SYSTEM THAT HAS EXPERIENCED A FAULT 

TECHNICAL FIELD 
The invention relates to recovery of systems that have experienced faults. 

BACKGROUND 

Improvements in technology have provided users with a wide variety of devices to 
perform various tasks. Examples of such devices include desktop computer systems, 
portable computer systems, personal digital assistants (PDAs), mobile telephones, and so 
forth. These devices are relatively sophisticated, including processing elements 
(e.g., microprocessors or microcontrollers) and storage devices (e.g., hard disk drives, 
dynamic random access memories or DRAMs, and so forth). 

A typical device includes an operating system (e.g., a WINDOWS operating 
system, a UNIX operating system, a LINUX operating system, etc.) that is loaded when 
the device is started. Application software is also loaded into the device to provide useful 
functions for users. Example applications include word processing applications, 
electronic mail applications, web browsing applications, calendar and address book 
applications, and so forth. 

Despite improvements in technology, failures in various components of a device 
remain a persistent problem. When a component of a device, such as a hard disk drive, 
fails, the user may be left with an inoperable device. One option for the user is to take 
the device to a repair shop where an attempt may be made to recover the failed 
component, such as the failed hard disk drive. In some cases, data on the hard disk drive 
can be recovered so that loss of data is minimized. However, in many other cases, the 
data stored on the hard disk drive is lost unless the user has diligently backed up the data. 

Conventionally, recovery of a failed component such as the hard disk drive is an 
arduous process that is often frustrating for the user. A need thus exists for an improved 
method and apparatus for recovering a device to an operational state after a failure has 
occurred. 
SUMMARY 

In general, according to one embodiment, a system comprises an interface to a 
network and a first operational element to perform one or more tasks in the system. A 
storage element contains a flag to indicate if a fault has occurred with the first operational 
element. A backup device enables access to the network through the interface in response 
to the flag indicating failure of the first operational element. 

In general, according to another embodiment, a system comprises a main storage 
device, a backup storage device, and a routine executable to boot from the backup storage 
device in case of a system fault. The backup storage device enables access over a 
network to retrieve data from a network node to recover the system. 

Other features and embodiments will become apparent from the following 
description, from the claims, and from the drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 
Fig. 1 illustrates an embodiment of a network system including a network, various nodes 
coupled to the network, and a backup storage system. 

Fig. 2 is a block diagram of components of a node of Fig. 1, in accordance with an 
embodiment. 

Fig. 3 is a flow diagram of tasks performed for a failure recovery in the node of 
Fig. 2, in accordance with an embodiment. 

DETAILED DESCRIPTION 
In the following description, numerous details are set forth to provide an 
understanding of the present invention. However, it will be understood by those skilled 
in the art that the present invention may be practiced without these details and that 
numerous variations or modifications from the described embodiments may be possible. 

Referring to Fig. 1, a network system 10 includes a network 12 that is coupled to 
network nodes 14, 16, and 18. Examples of the nodes 14, 16, and 18 include desktop 
computer systems, portable computer systems, and other types of systems having access 
to the network 12 (over either wired or wireless connections). Examples of the network 
12 include local area networks (LANs), wide area networks (WANs), the Internet, and so 
forth. 

A backup storage system 20 accessible over the network 12 stores data used to 
recover nodes 14, 16, and 18 in case of a fault (such as a component experiencing an 
error or failure) in the nodes. The data stored in the backup storage system 20 
includes user data, such as user-created documents or files, electronic mail messages, 
calendar and address book files, and so forth. The data stored in the backup storage 
system 20 also includes software, such as the operating system and application software 
that are stored and executed in each of the nodes. In one embodiment, the user data and 
software are stored as image data 30, 32, and 34 that correspond to nodes 14, 16, and 18, 
respectively. Thus, in case of a fault in node 14, the image data 30 is retrieved from the 
backup storage system 20 and communicated to the node 14, where the image data is used to 
recover the node 14. Similarly, image data 32 and 34 are used to recover nodes 16 and 
18, respectively. 
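The per-node organization of the backup storage system 20 can be pictured as a lookup from a node identifier to that node's image. The following is a minimal Python sketch under stated assumptions: the class name `BackupStorageSystem`, the string node identifiers such as `"node-14"`, and the in-memory dictionary are all hypothetical and merely stand in for whatever storage and indexing an actual backup server would use.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BackupStorageSystem:
    """Hypothetical in-memory model of backup storage system 20.

    Each node identifier maps to one image holding that node's user data
    and software (image data 30, 32, and 34 for nodes 14, 16, and 18).
    """
    images: Dict[str, bytes] = field(default_factory=dict)

    def store_image(self, node_id: str, image: bytes) -> None:
        # Record (or replace) the backup image for a node.
        self.images[node_id] = image

    def retrieve_image(self, node_id: str) -> bytes:
        # Return the image for the faulted node; fail loudly if none exists.
        if node_id not in self.images:
            raise LookupError(f"no image stored for node {node_id!r}")
        return self.images[node_id]


backup = BackupStorageSystem()
backup.store_image("node-14", b"image data 30")
backup.store_image("node-16", b"image data 32")
backup.store_image("node-18", b"image data 34")
recovered = backup.retrieve_image("node-14")  # used to recover node 14
```

Keeping one image per node means recovery of a given node never needs to consult the images of other nodes.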

As illustrated, the node 18 includes a main hard disk drive 24, a backup storage 
device 22, and a backup routine 26 executable in the node 18. The backup routine 26 is 
initially stored on the backup storage device 22 and is executable to enable the node 18 to 
access the backup storage system 20 over the network 12 if one of several 
predetermined faults occurs in the node 18. Examples of such predetermined faults 
include failure of the hard disk drive, an unrecoverable error on the hard disk 
drive, corruption of software and of files associated with the software (e.g., library files), 
and so forth. The backup routine 26 and the backup storage device 22 may 
collectively be referred to as the "backup device 25." In the illustrated embodiment, the 
backup routine 26 is a software routine loaded from the backup storage device 22 for 
execution on a processing element in the node 18. Alternatively, the backup device can be a 
hardware component that performs backup tasks in response to detection of certain types 
of faults. 
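Whether the backup device 25 is invoked depends on the fault falling within the predetermined set. As a hedged illustration, the Python sketch below encodes the examples named above as an enumeration. The `Fault` members, the `PREDETERMINED_FAULTS` set, the `TRANSIENT` category (an assumed example of a fault that is not handled), and the `should_use_backup_device` function are all hypothetical; a real system would derive the fault type from drive status or error reporting rather than receive it directly.

```python
from enum import Enum, auto


class Fault(Enum):
    """Hypothetical fault categories, drawn from the examples above."""
    DISK_FAILURE = auto()              # the hard disk drive has failed
    UNRECOVERABLE_DISK_ERROR = auto()  # an unrecoverable error on the drive
    CORRUPTED_SOFTWARE = auto()        # software or library files corrupted
    TRANSIENT = auto()                 # assumed example of a fault NOT handled


# Faults that hand control to backup device 25 (an assumed policy).
PREDETERMINED_FAULTS = frozenset({
    Fault.DISK_FAILURE,
    Fault.UNRECOVERABLE_DISK_ERROR,
    Fault.CORRUPTED_SOFTWARE,
})


def should_use_backup_device(fault: Fault) -> bool:
    # Only predetermined faults trigger network-based recovery.
    return fault in PREDETERMINED_FAULTS
```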

More generally, the node 18 includes a main operational portion, which in one 
embodiment contains the main hard disk drive 24 (or some other type of storage element). 
The main operational portion controls operation when the node 18 functions normally. 
The main hard disk drive 24 stores the operating system and application software, which 
are loaded into the node 18 to perform useful tasks. In case of certain predetermined faults, 
the backup device 25 is used to enable access over the network 12 to the backup storage 
system 20 to retrieve data to recover the main operational portion of the node 18. 

The backup storage device 22 can be implemented in a number of different ways. 
For example, the backup storage device 22 can be a bootable mini-drive that is mounted 
inside the chassis of the node or on a motherboard in the node. The mini-drive can be a hard disk 
drive having a relatively small storage capacity for reduced cost. Alternatively, the mini-drive 
can be another type of non-volatile memory, such as flash memory, electrically 
erasable and programmable read-only memory (EEPROM) devices, and so forth. Instead 
of being a separate component in the chassis of each node, the mini-drive can also be integrated 
onto the motherboard of the node if its size permits. Alternatively, the backup storage 
device 22 can be a full form factor drive. 

The backup storage device 22 can also include a compact disk (CD) or digital 
video disk or digital versatile disk (DVD) drive in which a CD or DVD is loaded. The 
CD or DVD contains the software needed to enable the node 18 to access the network 
12. Alternatively, the backup storage device 22 includes a partition on the main hard disk 
drive 24. This option relies on the likelihood that a fault corrupts only one part of the 
hard disk drive 24 while leaving another portion uncorrupted. The backup storage device 22 
can also include other bootable cartridges or drives. 

An example of the backup routine 26 is a browser capable of executing on 
a processor in each node to gain access to the network 12. To avoid having to load a 
large operating system such as the WINDOWS® operating system, the browser can be a 
reduced-version browser that does not need a standard full-scale computer operating 
system to run. Examples of such "mini-browsers" include browsers that run in PDAs 
and other handheld devices. Alternatively, mini-browsers can be designed to operate in a 
DOS operating system, a WINDOWS® CE operating system, or other "lite" operating 
systems. 

Referring to Fig. 2, an example of the node 18 (which has an arrangement similar 
to that of nodes 14 and 16) is illustrated. The node 18 includes a central processing unit (CPU) 
100 that forms the processing core of the node 18. A host bridge 102 is connected over a 
host bus to the CPU 100. The host bridge 102 is also connected to a system bus 104, 
such as a Peripheral Component Interconnect (PCI) bus. Additionally, the host bridge 
102 contains control elements to interface a main memory 103 and a video controller 116 
that controls presentation of images on a display 114. The system bus 104 is connected to 
a network interface 112 that manages communications to the network 12 through a port 
110. 

Other components of the node 18 include a south bridge 123 coupled to the 
system bus 104. The south bridge 123 is in turn coupled to a disk controller 124 that is 
connected to the main disk drive 24. The disk controller 124 can also manage 
communications with a CD and/or DVD drive 126. An input/output (I/O) controller 118, 
which is connected to a floppy disk drive 120 and to a mini-drive 122, is also coupled to 
the south bridge 123. 

When the node 18 first starts up, a basic input/output system (BIOS) routine 108 
is loaded to perform boot and initialization tasks. The BIOS routine 108 is stored in a 
non-volatile memory 106, which can be a flash memory, an EEPROM, or another like 
memory device. Access to the non-volatile memory 106 is provided through the south 
bridge 123. 

The backup storage device 22 of Fig. 1 can be one or more of the following 
elements in the node 18: the mini-drive 122, the CD or DVD drive 126, the floppy drive 
120, the backup partition 130 in the main hard disk drive 24, or an additional drive like 
the main drive 24. 
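Because several of these elements may be present at once, firmware needs some order in which to try them. That order is not specified here; the Python sketch below assumes one (the mini-drive 122 first, then the backup partition 130, then the CD or DVD drive 126, then the floppy drive 120) purely for illustration. The `BootCandidate` type, the `DEFAULT_ORDER` tuple, and the `select_backup_device` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class BootCandidate:
    """Hypothetical description of one possible backup storage device 22."""
    name: str
    present: bool    # the device is installed or the medium is loaded
    bootable: bool   # the device holds a loadable backup routine 26


# Assumed search order; a real BIOS would typically make this configurable.
DEFAULT_ORDER: Sequence[str] = (
    "mini-drive 122",
    "backup partition 130",
    "CD/DVD drive 126",
    "floppy drive 120",
)


def select_backup_device(
    candidates: Iterable[BootCandidate],
    order: Sequence[str] = DEFAULT_ORDER,
) -> Optional[BootCandidate]:
    # Return the first present, bootable device in search order, or None.
    by_name = {c.name: c for c in candidates}
    for name in order:
        candidate = by_name.get(name)
        if candidate is not None and candidate.present and candidate.bootable:
            return candidate
    return None


devices = [
    BootCandidate("mini-drive 122", present=False, bootable=False),
    BootCandidate("backup partition 130", present=True, bootable=True),
    BootCandidate("floppy drive 120", present=True, bootable=True),
]
chosen = select_backup_device(devices)  # the backup partition 130
```

Returning `None` when nothing qualifies leaves the caller free to fall back to a manual boot-drive selection.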

Although not shown, the node also includes various layers and stacks to enable 
communications over the network 12. For example, a network stack can include a 
TCP/IP (Transmission Control Protocol/Internet Protocol) or a UDP/IP (User Datagram 
Protocol/Internet Protocol) stack. TCP is described in Request for Comments (RFC) 793, 
entitled "Transmission Control Protocol," dated September 1981; and UDP is described in 
RFC 768, entitled "User Datagram Protocol," dated August 1980. One version of IP is 
described in RFC 791, entitled "Internet Protocol," dated September 1981; and 
another version of IP is described in RFC 2460, entitled "Internet Protocol, Version 6 
(IPv6) Specification," dated December 1998. TCP and UDP are transport-layer protocols for 
managing communications over an IP network. 



Also, various services enable the communication of requests over the network 12, 
such as requests between a node and the backup storage system 20. One such service is 
the Hypertext Transfer Protocol (HTTP) service, which enables one network element to send 
requests to another network element and the destination network element to return 
responses to the requesting network element. 
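As a hedged sketch of such an HTTP exchange, the Python code below builds a GET request that a backup routine 26 might send to the backup storage system 20, using only the standard `urllib.parse` and `urllib.request` modules. The host `backup.example`, the `/images` path, and the `node` query parameter are hypothetical; the actual request format is not specified here.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_image_request(base_url: str, node_id: str) -> Request:
    """Build an HTTP GET request for a node's image (hypothetical endpoint)."""
    query = urlencode({"node": node_id})
    url = f"{base_url.rstrip('/')}/images?{query}"
    return Request(url, method="GET",
                   headers={"Accept": "application/octet-stream"})


def fetch_image(request: Request, timeout: float = 30.0) -> bytes:
    # Send the request and return the response body (the image data).
    with urlopen(request, timeout=timeout) as response:
        return response.read()


req = build_image_request("http://backup.example/", "node-18")
# fetch_image(req) would contact the server; it is not called here.
```

Separating request construction from the network call keeps the request format easy to inspect without a live backup server.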

Referring to Fig. 3, the failure recovery process performed in one of the nodes 14, 
16, and 18 is illustrated. The operating system 134 determines (at 202) if the node has 
experienced a fault. If so, the operating system 134 sets (at 204) a fail flag 132 (in the 
main hard disk drive 24) to an active state. Alternatively, the fail flag can be stored in the 
non-volatile memory 106, the mini-drive 122, or another memory storage element in the 
node. 
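A fail flag of this kind amounts to a small value at a known location in persistent storage. The Python sketch below models the storage element as a byte array. The offset `FAIL_FLAG_OFFSET`, the marker value `FLAG_ACTIVE`, the `FlagStore` class, and the `handle_fault_check` function are hypothetical choices for illustration, not a documented layout; the `clear_fail_flag` method is likewise an assumption, since the text above does not say when the flag is reset.

```python
FAIL_FLAG_OFFSET = 0x10  # hypothetical byte offset of fail flag 132
FLAG_ACTIVE = 0xA5       # hypothetical "active" marker value
FLAG_CLEAR = 0x00


class FlagStore:
    """Stand-in for the storage element holding fail flag 132."""

    def __init__(self, size: int = 64) -> None:
        self._cells = bytearray(size)  # all cells start cleared (0x00)

    def set_fail_flag(self) -> None:
        self._cells[FAIL_FLAG_OFFSET] = FLAG_ACTIVE

    def clear_fail_flag(self) -> None:
        # Assumed to be called after a successful recovery.
        self._cells[FAIL_FLAG_OFFSET] = FLAG_CLEAR

    def fail_flag_set(self) -> bool:
        # Only the exact marker value counts as active.
        return self._cells[FAIL_FLAG_OFFSET] == FLAG_ACTIVE


def handle_fault_check(store: FlagStore, fault_detected: bool) -> None:
    # Steps 202 and 204: if the operating system detects a fault,
    # set the fail flag to its active state.
    if fault_detected:
        store.set_fail_flag()
```

Using a distinctive marker value rather than "any non-zero byte" is one assumed design choice: erased flash memory typically reads as 0xFF, so a specific pattern such as 0xA5 is less likely to be mistaken for an active flag.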

Next, either in response to a user request to restart or automatically upon detection 
of the fault, the node is rebooted (at 206). When the node starts up, the BIOS routine 108 
is loaded to perform boot tasks. One of the tasks performed by the BIOS routine 108 is to 
determine (at 208) if the fail flag 132 has been set. If not, a normal boot process is 
performed (at 210) by the BIOS routine 108. If the fail flag 132 is set, then the BIOS 
routine 108 accesses (at 212) the backup storage device 22. Alternatively, instead of 
relying on an automatic check of the fail flag 132, a user can manually boot from the 
backup storage device 22 through the BIOS (such as by selecting the boot drive). 
Software on the backup storage device 22, including the backup routine 26, is loaded (at 
214) into the node for execution on the CPU 100. As noted above, the backup routine 26 
can be a mini-browser that enables communications over the network 12. 
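The boot-time decision at step 208, together with the manual alternative, reduces to a small selection function. This is a hedged Python sketch: the `BootSource` enumeration and the `choose_boot_source` function are hypothetical, and actual BIOS firmware would implement equivalent logic in its own environment rather than in Python.

```python
from enum import Enum


class BootSource(Enum):
    MAIN = "main hard disk drive 24"
    BACKUP = "backup storage device 22"


def choose_boot_source(fail_flag_set: bool,
                       user_selected_backup: bool = False) -> BootSource:
    """Select the boot device after the reboot at step 206."""
    # Step 208: an active fail flag, or a manual boot-drive selection,
    # sends the BIOS to the backup storage device (steps 212 and 214).
    if fail_flag_set or user_selected_backup:
        return BootSource.BACKUP
    # Otherwise perform the normal boot process (step 210).
    return BootSource.MAIN
```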

The backup routine 26 presents an indication of the fault (at 216), such as 
displaying a warning on the display 114. The backup routine 26 then waits (at 218) for a 
user request to recover. If a request to recover the node is received, then the backup 
routine 26 accesses (at 220) the backup storage system 20 over the network 12. Image 
data (30, 32, or 34) is retrieved from the backup storage system 20 and downloaded (at 
222) into the node, where the image data is used to recover the node. A scan disk 
operation may be performed to identify defective portions of the hard disk drive 24. 
The image data can then be copied to the remaining portions of the hard disk 
drive 24 to enable normal operation of the node. 
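Steps 216 through 222, together with the optional scan disk operation, can be sketched as follows. This Python sketch is a simplification under stated assumptions: the disk is modeled as numbered blocks, the scan is assumed to return a set of defective block numbers, and the image is written to the non-defective blocks in ascending order. The names `restore_image` and `recover_node`, and the callables passed to `recover_node`, are hypothetical; real drives and file systems track bad sectors themselves.

```python
from typing import Callable, Dict, List, Optional, Set


def restore_image(image_blocks: List[bytes], disk_blocks: int,
                  defective: Set[int]) -> Dict[int, bytes]:
    """Place image blocks on non-defective disk blocks, in ascending order."""
    good = [b for b in range(disk_blocks) if b not in defective]
    if len(good) < len(image_blocks):
        raise ValueError("not enough non-defective blocks for the image")
    return dict(zip(good, image_blocks))


def recover_node(confirm: Callable[[], bool],
                 fetch_image: Callable[[], List[bytes]],
                 scan_disk: Callable[[], Set[int]],
                 disk_blocks: int) -> Optional[Dict[int, bytes]]:
    print("Warning: a fault was detected on this system.")  # step 216
    if not confirm():                # step 218: no recovery requested
        return None
    image_blocks = fetch_image()     # steps 220 and 222: download the image
    defective = scan_disk()          # optional scan disk operation
    return restore_image(image_blocks, disk_blocks, defective)


layout = recover_node(lambda: True, lambda: [b"a", b"b", b"c"],
                      lambda: {1}, disk_blocks=5)
# Block 1 is skipped, so the image lands on blocks 0, 2, and 3.
```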

The various software routines or modules described herein may be executable on 
various processing elements. Such processing elements include microprocessors, 
microcontrollers, processor cards (including one or more microprocessors or 
microcontrollers), or other control or computing devices. As used here, a "controller" can 
refer to hardware, software, or a combination of the two. 

The storage units include one or more machine-readable storage media for storing 
data and instructions. The storage media include different forms of memory including 
semiconductor memory devices such as dynamic or static random access memories 
(DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), 
electrically erasable and programmable read-only memories (EEPROMs), and flash 
memories; magnetic disks such as fixed, floppy and removable disks; other magnetic 
media including tape; or optical media such as CDs or DVDs. Instructions that make up 
the various software routines or modules, when executed by a respective processing 
element, cause the corresponding node to perform programmed acts. 

The instructions of the software routines or programs are loaded or transported 
into the node in one of many different ways. For example, code segments including 
instructions stored on floppy disks, CD or DVD media, or a hard disk, or transported 
through a network interface card, modem, or other interface device, are loaded into the 
system and executed as corresponding software routines or modules. In the loading or 
transport process, data signals that are embodied in carrier waves (transmitted over 
telephone lines, network lines, wireless links, cables, and the like) communicate the code 
segments, including instructions, to the node. Such carrier waves may be in the form of 
electrical, optical, acoustical, electromagnetic, or other types of signals. 

While the invention has been disclosed with respect to a limited number of 
embodiments, those skilled in the art will appreciate numerous modifications and 
variations therefrom. It is intended that the appended claims cover all such modifications 
and variations as fall within the true spirit and scope of the invention. 

What is claimed is: 



1. A system comprising: 
an interface to a network; 
a first operational element to perform one or more tasks in the system; 
a storage element containing a flag to indicate if a fault has occurred with 
the first operational element; and 
a backup device to enable access to the network through the interface in 
response to the flag indicating failure of the first operational element. 

2. The system of claim 1, wherein the first operational element comprises a 
disk drive. 

3. The system of claim 1, wherein the backup device comprises a backup 
storage element containing a backup routine adapted to perform communications through 
the interface to the network. 

4. The system of claim 3, wherein the backup routine comprises a browser. 

5. The system of claim 3, wherein the first operational element comprises a 
first disk drive, and wherein the backup storage element comprises a second disk drive 
separate from the first disk drive. 

6. The system of claim 5, wherein the second disk drive has a smaller storage 
capacity than the first disk drive. 

7. The system of claim 1, wherein the backup storage element comprises 
non-volatile memory. 

8. The system of claim 1, wherein the first operational element comprises a 
disk drive having plural partitions, and wherein the backup storage element comprises 
one of the partitions. 

9. The system of claim 1, wherein the backup storage element comprises a 
removable disk drive. 

10. The system of claim 1, the backup device to retrieve user data and 
software over the network to recover the system. 

11. The system of claim 1, wherein the first operational element comprises a 
storage element, the backup device to retrieve an image of the storage element to recover 
the storage element to its operational state. 

12. A method of performing error recovery in a system, comprising: 
detecting if an operating portion of the system has experienced a fault; 
accessing a backup device to enable communication over a network; and 
retrieving data over the network to recover the system. 

13. The method of claim 12, further comprising loading a backup software 
routine from the backup device. 

14. The method of claim 13, wherein the backup software routine comprises a 
browser, the method further comprising executing the browser to access the network to 
retrieve the data. 

15. The method of claim 13, further comprising executing the backup software 
routine to access the network. 

16. The method of claim 12, wherein retrieving the data comprises retrieving 
the data from a backup storage system coupled to the network. 

17. An article comprising at least one storage medium containing instructions 
that when executed cause a system to: 
detect if an operating portion of the system has experienced a fault; 
access a backup device to enable communication over a network; and 
retrieve data over the network to recover the system. 

18. A method of performing recovery in a system having a main storage 
device and a backup storage device, comprising: 
booting from the backup storage device instead of the main storage device if 
the system has experienced a fault; and 
using the backup storage device to enable communications over a network 
to retrieve data to recover the system. 

19. The method of claim 18, further comprising loading a routine from the 
backup storage device to enable the network communication. 

20. The method of claim 19, wherein loading the routine comprises loading a 
browser. 

21. A system comprising: 
a main storage device; 
a backup storage device; and 
a routine executable to boot from the backup storage device in case of a 
system fault, 
the backup storage device enabling access over a network to retrieve data 
from a network node to recover the system. 

22. The system of claim 21, wherein the backup storage device comprises a 
network access routine that is loadable for execution in the system, the network access 
routine to enable access over the network. 

23. The system of claim 21, wherein the routine comprises a BIOS routine. 

ABSTRACT OF THE DISCLOSURE 
A method and system for recovering a system that has experienced a fault include 
a backup device to enable access to a network through an interface in response to the 
fault. The system includes a main operational portion that controls operation of the 
system under normal conditions. However, if a fault occurs, then the backup device can 
be selected to take over control of the system so that data can be retrieved from backup 
storage to recover the system. The backup device includes software and/or hardware 
components to enable the system to access the network even though the main operational 
portion may not be functioning properly. 

Corey McGowan 



ELECTION OF POWER OF ATTORNEY 



The undersigned, being Assignee of the entire interest in the above- 
identified application by virtue of an Assignment filed herewith, hereby elects 
under 37 C.F.R. § 3.71 to prosecute the application to the exclusion of the 
inventor. 

The Assignee hereby revokes any previous Powers of Attorney and 
appoints: Dan C. Hu, Reg. No. 40,025, Fred G. Pruner, Jr., Reg. No. 40,779; 
Timothy N. Trap, Reg. No. 28,994; and Ruben S. Bains, Reg. No. 46,532, my 
patent attorneys, of TROP, PRUNER & HU, P.C., with offices located at 8554 Katy 
Freeway, Suite 100, Houston, TX 77024, telephone (713) 468-8880; and Hoyt A 
Fleming, III, Reg. No. 41,752; Paul A. Revis, Reg. No. 45,040; and Steven P. 
Arnold, Reg. No. 33,354, my patent attorneys, of MICRON ELECTRONICS, INC., 
with full power of substitution and revocation, to prosecute this application, to 
make alterations and amendments therein, to transact all business in the Patent 
and Trademark Office connected therewith, to receive any Letters Patent, and for 
one year after issuance of such Letters Patent to file any request for a certificate 
of correction that may be deemed appropriate. 

Pursuant to 37 C.F.R. § 3.73, the undersigned duly authorized designee of 
Assignee certifies that the evidentiary documents have been reviewed, 
specifically the Assignment to MICRON ELECTRONICS, INC., referenced below, 
and certifies that to the best of my knowledge and belief, title remains in the 
name of the Assignee. 

MICE-0091-US (00.00902) 

Send correspondence to Dan C. Hu of TROP, PRUNER & HU, P.C., 8554 
Katy Freeway, Suite 100, Houston, TX 77024 and direct telephone calls to Dan C. 
Hu at (713) 468-8880. 




Paul A. Revis 

Intellectual Property Counsel 
Micron Electronics, Inc. 



